166 research outputs found

    A Pan-cancer Somatic Mutation Embedding using Autoencoders

    Background: Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high-dimensional tumor samples, characterised with germline and somatic mutation data, require advanced computational modelling for data interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better, lower-dimensional representations from large somatic mutation data of 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. Conclusions: The learned latent space maps the original samples into a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow an easier exploration of the heterogeneity within and across tumor types, and an accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
    Affiliations: Palazzo, Martin: Consejo Nacional de Investigaciones Científicas y Técnicas, Oficina de Coordinación Administrativa Parque Centenario, Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck, Argentina; Universidad Tecnológica Nacional, Argentina. Beauseroy, Pierre: Université de Technologie de Troyes, France. Yankilevich, Patricio: Consejo Nacional de Investigaciones Científicas y Técnicas, Oficina de Coordinación Administrativa Parque Centenario, Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck, Argentina.
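
    The abstract above does not describe the network architecture, so the following is only a minimal sketch of the general idea of an autoencoder embedding. It assumes a deliberately tiny linear autoencoder with tied weights, trained by plain gradient descent on a made-up binary matrix standing in for mutation profiles. The names `encode`, `decode`, `mse`, `train`, and the toy matrix `X` are all hypothetical.

```python
import random

def encode(W, x):
    """Project a sample x (length d) onto the latent space (length k)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def decode(W, z):
    """Map a latent vector back to input space using the transposed weights."""
    d = len(W[0])
    return [sum(W[k][j] * z[k] for k in range(len(W))) for j in range(d)]

def mse(W, X):
    """Mean squared reconstruction error over all samples."""
    total = 0.0
    for x in X:
        x_hat = decode(W, encode(W, x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(X)

def train(X, latent_dim=2, lr=0.01, epochs=500, seed=0):
    """Full-batch gradient descent on the reconstruction error."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(latent_dim)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(latent_dim)]
        for x in X:
            z = encode(W, x)
            r = [a - b for a, b in zip(x, decode(W, z))]  # residual x - x_hat
            Wr = encode(W, r)
            for k in range(latent_dim):
                for j in range(d):
                    # Gradient of ||x - W^T W x||^2 with respect to W[k][j]
                    grad[k][j] += -2.0 * (z[k] * r[j] + Wr[k] * x[j]) / len(X)
        for k in range(latent_dim):
            for j in range(d):
                W[k][j] -= lr * grad[k][j]
    return W

# Toy data (assumption): rows are samples, columns are genes, 1 = mutated.
# The first three samples share mutations in genes 0-2, the last three in genes 3-5.
X = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

W = train(X)
embedding = [encode(W, x) for x in X]  # one 2-D latent vector per sample
print("reconstruction error:", mse(W, X))
print("embedding:", embedding)
```

    In a pipeline like the one described, the rows of `embedding` would then be passed to a kernel method or classifier. A real model would likely use non-linear layers and far more samples.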

    Feature extraction and selection using statistical dependence criteria

    Dimensionality reduction using feature extraction and selection approaches is a common stage of many regression and classification tasks. In recent years there have been significant efforts to reduce the dimension of the feature space without losing information that is relevant for prediction. This objective can be cast as a conditional independence condition between the response or class labels and the transformed features. Building on this, in this work we use measures of statistical dependence to estimate a lower-dimensional linear subspace of the features that retains the sufficient information. Unlike likelihood-based and many moment-based methods, the proposed approach is semi-parametric and does not require model assumptions on the data. A regularized version that achieves simultaneous variable selection is also presented. Experiments with simulated data show that the performance of the proposed method compares favorably to well-known linear dimension reduction techniques.
    Sociedad Argentina de Informática e Investigación Operativa (SADIO)
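
    The abstract does not say which dependence measure is used. As an illustration only, the sketch below assumes the Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels, in its biased 1/n² form, and compares it with Pearson correlation on a made-up example. It scores a single feature against a response and does not estimate a subspace. The function names and data are hypothetical.

```python
import math

def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar values."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]

def center(gram):
    """Double-center a symmetric Gram matrix (H K H with H = I - 11^T / n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(Kc Lc) / n^2."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(gaussian_gram(y, sigma))
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

def pearson(x, y):
    """Ordinary linear correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data (assumption): y depends on x only through x squared.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [v * v for v in x]
constant = [1.0] * len(x)  # a feature carrying no information at all

print("Pearson(x, y):", pearson(x, y))        # close to zero: no linear relation
print("HSIC(x, y):", hsic(x, y))              # positive: dependence is detected
print("HSIC(constant, y):", hsic(constant, y))
```

    The point of the comparison is that a kernel dependence measure can flag a feature that a purely linear, moment-based score would discard.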

    Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections

    Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis based on selected gene profiles is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense, or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense. Moreover, this classifier takes into account asymmetric penalties that depend on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path, and it minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those obtained by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machine classifiers.
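
    The decision rule behind class-selective rejection can be sketched as follows. This is not the ν-1-SVM method of the abstract: it assumes that class posterior probabilities are already available and simply picks, among all non-empty subsets of classes, the one with the smallest expected loss. The loss values (`miss_cost`, `ambiguity_cost`) and all function names are hypothetical.

```python
from itertools import combinations

def candidate_decisions(n_classes):
    """All non-empty subsets of classes: single classes, partial rejections,
    and the full set (total rejection)."""
    classes = range(n_classes)
    return [frozenset(c)
            for size in range(1, n_classes + 1)
            for c in combinations(classes, size)]

def example_loss(decision, true_class, miss_cost=1.0, ambiguity_cost=0.2):
    """Illustrative loss: full penalty if the true class is excluded,
    a smaller penalty for every extra class kept in the decision."""
    if true_class not in decision:
        return miss_cost
    return ambiguity_cost * (len(decision) - 1)

def decide(posteriors, loss=example_loss):
    """Return the subset of classes that minimizes the expected loss."""
    def expected_loss(decision):
        return sum(p * loss(decision, c) for c, p in enumerate(posteriors))
    return min(candidate_decisions(len(posteriors)), key=expected_loss)

print(sorted(decide([0.90, 0.05, 0.05])))  # confident case
print(sorted(decide([0.45, 0.45, 0.10])))  # two classes hard to tell apart
print(sorted(decide([0.34, 0.33, 0.33])))  # no class stands out
```

    Asymmetric penalties fit the same scheme: replacing `example_loss` with a function whose cost depends on the specific class and decision changes which subsets are selected. Restricting the candidates to single classes with a 0/1 loss recovers the usual Bayes decision.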

    Learning Kernels from genetic profiles to discriminate tumor subtypes

    Our work performs the feature selection step in Multiple Kernel Learning by optimizing the Kernel Target Alignment (KTA) score. It begins by building feature-wise Gaussian kernel functions. Then, through a constrained linear combination of the feature-wise kernels, we increase the KTA to obtain a new, optimized custom kernel. The linear combination yields a sparse solution in which only a few kernels survive to improve the KTA, and consequently a reduced feature subset is obtained. Considerably reducing the original gene set makes it possible to study the selected genes in more depth for clinical purposes. The higher the KTA obtained, the better the feature selection, since the custom kernels are later used for classification. The final kernel after optimizing the KTA is a linear combination of kernels K_i, each weighted by a coefficient μ_i. The vector μ is computed during the optimization process.
    Sociedad Argentina de Informática e Investigación Operativa
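
    To make the KTA score concrete, the sketch below computes the alignment between a kernel matrix K and the ideal target kernel yyᵀ, A = ⟨K, yyᵀ⟩_F / (‖K‖_F ‖yyᵀ‖_F), for feature-wise Gaussian kernels and for a weighted combination of them. It does not reproduce the constrained optimization of μ. The toy features, the bandwidth `sigma`, and the function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(feature, sigma=0.5):
    """Gaussian kernel matrix built from a single feature (one value per sample)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in feature]
            for a in feature]

def frobenius(A, B):
    """Frobenius inner product of two square matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K, y):
    """Kernel target alignment between K and the target kernel y y^T."""
    Y = [[a * b for b in y] for a in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))

def combine(kernels, mu):
    """Linear combination sum_i mu_i * K_i of feature-wise kernels."""
    n = len(kernels[0])
    return [[sum(m * K[i][j] for m, K in zip(mu, kernels)) for j in range(n)]
            for i in range(n)]

# Toy data (assumption): labels in {-1, +1}, one informative and one uninformative feature.
y = [-1, -1, -1, 1, 1, 1]
informative = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
noise = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

kernels = [gaussian_kernel(informative), gaussian_kernel(noise)]
a_informative = alignment(kernels[0], y)
a_noise = alignment(kernels[1], y)
a_mixed = alignment(combine(kernels, [0.5, 0.5]), y)
a_sparse = alignment(combine(kernels, [1.0, 0.0]), y)

print("informative:", a_informative, "noise:", a_noise)
print("mu = (0.5, 0.5):", a_mixed, "mu = (1, 0):", a_sparse)
```

    In this toy case the sparse weighting, which drops the uninformative kernel, scores a higher alignment than the equal weighting. This matches the behaviour the abstract describes, where only a few kernels survive.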

    Extraction of discriminant features by optimization of parameterized functions (original title: Extraction d'attributs discriminants par optimisation de fonctions paramétrées)

    A method is proposed to automatically extract discriminant features for a process described by a set of labeled examples. The features are selected using families of parameterized functions, by finding the parameters that are optimal with respect to a class-separability criterion. The chosen parameterized functions measure characteristics corresponding to the moments of order 0 or 1 of a weighted one- or two-dimensional representation. Because the parameterized functions are continuous, an infinite set of features can be explored while avoiding a problem of combinatorial complexity. The class-separability criterion is based on scatter matrices and allows features to be selected jointly. The construction of a linear classifier suited to the extracted features is also proposed. The method is applied to simulated signals described by their time-domain representation.
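
    The abstract does not give the exact form of its scatter-matrix criterion. The sketch below assumes one common choice, J = trace(S_w⁻¹ S_b), where S_w is the within-class and S_b the between-class scatter matrix. It evaluates J for a pair of features taken jointly. The toy point sets and function names are hypothetical, and the code is restricted to two features so that the matrix inverse can be written out by hand.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    return [sum(p[k] for p in points) / len(points) for k in range(2)]

def scatter_matrices(classes):
    """Within-class (Sw) and between-class (Sb) scatter matrices for 2-D features."""
    overall = mean([p for pts in classes for p in pts])
    Sw = [[0.0, 0.0], [0.0, 0.0]]
    Sb = [[0.0, 0.0], [0.0, 0.0]]
    for pts in classes:
        mc = mean(pts)
        gap = [mc[0] - overall[0], mc[1] - overall[1]]
        for a in range(2):
            for b in range(2):
                Sw[a][b] += sum((p[a] - mc[a]) * (p[b] - mc[b]) for p in pts)
                Sb[a][b] += len(pts) * gap[a] * gap[b]
    return Sw, Sb

def separability(classes):
    """Criterion J = trace(Sw^-1 Sb); larger means better separated classes."""
    Sw, Sb = scatter_matrices(classes)
    det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
    inv = [[Sw[1][1] / det, -Sw[0][1] / det],
           [-Sw[1][0] / det, Sw[0][0] / det]]
    return sum(inv[a][b] * Sb[b][a] for a in range(2) for b in range(2))

# Toy data (assumption): each class is a unit square of four points.
class_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
far_b = [(4, 0), (4, 1), (5, 0), (5, 1)]   # shifted by 4 along the first feature
near_b = [(1, 0), (1, 1), (2, 0), (2, 1)]  # shifted by only 1

j_far = separability([class_a, far_b])
j_near = separability([class_a, near_b])
print("J (far):", j_far, "J (near):", j_near)
```

    In a method like the one described, the features would themselves be outputs of parameterized functions, and the parameters would be tuned to increase a criterion of this kind.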

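    Classification based on joint feature extraction in the time-frequency plane using a mutual information criterion (original title: Classification basée sur l'extraction conjointe d'attributs dans le plan temps-fréquence selon un critère d'information mutuelle)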

    The proposed method addresses signal classification based on the automatic, joint extraction of features, for non-stationary processes described only by a set of labeled examples. The features are defined as the result of transformations applied to the Wigner-Ville distribution of the signal to be classified. Each transformation is selected from a family of parameterized transformations. The parameter values are optimized to maximize the discriminant information carried jointly by the features, given a training set. The discriminant information is measured with a mutual information criterion based on estimating the joint distributions of the features. To take into account all the information carried by the features and to stay consistent with the extraction stage, the classifier also relies on estimated probability distributions. The resulting scheme has the advantage of making no assumption about the class-conditional distributions of the features. The method was applied to a classification problem on sleep electroencephalogram signals. Good performance was obtained from the joint extraction of two features.
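
    The value of scoring features jointly rather than one at a time can be illustrated with a small mutual information computation. This sketch assumes discrete features and estimates probabilities by simple counting, which is a simplification of the distribution estimation mentioned in the abstract. The XOR-style toy data and the function name are hypothetical.

```python
import math
from collections import Counter

def mutual_information(attributes, labels):
    """Mutual information in bits between discrete attribute values
    (scalars or tuples, for joint attributes) and class labels."""
    n = len(labels)
    joint = Counter(zip(attributes, labels))
    attr_counts = Counter(attributes)
    label_counts = Counter(labels)
    return sum(
        (count / n) * math.log2(count * n / (attr_counts[a] * label_counts[l]))
        for (a, l), count in joint.items()
    )

# Toy data (assumption): the label is the exclusive OR of two binary attributes.
attr_1 = [0, 0, 1, 1]
attr_2 = [0, 1, 0, 1]
labels = [a ^ b for a, b in zip(attr_1, attr_2)]

mi_first = mutual_information(attr_1, labels)
mi_second = mutual_information(attr_2, labels)
mi_joint = mutual_information(list(zip(attr_1, attr_2)), labels)
print("attribute 1 alone:", mi_first)
print("attribute 2 alone:", mi_second)
print("both attributes jointly:", mi_joint)
```

    Each attribute taken alone carries no information about the label, while the pair determines it completely. A criterion that evaluates attributes jointly can therefore find combinations that a one-at-a-time ranking would miss.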